Estimating Commit Sizes Efficiently

Authors

  • Philipp Hofmann
  • Dirk Riehle

Abstract

The quantitative analysis of software projects can provide insights that let us better understand open source and other software development projects. An important variable used in the analysis of software projects is the amount of work being contributed, the commit size. Unfortunately, post-facto, the commit size can only be estimated, not measured. This paper presents several algorithms for estimating the commit size. Our performance evaluation shows that simple, straightforward heuristics are superior to the more complex text-analysis-based algorithms. Not only are the heuristics significantly faster to compute, they also deliver more accurate results when estimating commit sizes. Based on this experience, we design and present an algorithm that improves on the heuristics, can be computed equally fast, and is more accurate than any of the prior approaches.
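The abstract does not say why a commit's size cannot be measured after the fact. One likely reason, stated here as an assumption, is that a line-based diff records only added and removed lines, so a modified line appears as one removal plus one addition. The sketch below assumes size is counted in changed lines and that the input is a unified diff. It derives a lower and an upper bound and takes their midpoint. The function names and the midpoint rule are hypothetical illustrations, not the heuristics or the algorithm evaluated in the paper.

```python
from typing import NamedTuple, Tuple


class DiffCounts(NamedTuple):
    added: int
    removed: int


def count_diff_lines(unified_diff: str) -> DiffCounts:
    """Count added and removed lines in a unified diff.

    Simplification (assumption): any line starting with '+++' or '---'
    is treated as a file header, so a content line that happens to
    start with those characters would be skipped.
    """
    added = removed = 0
    for line in unified_diff.splitlines():
        if line.startswith("+++") or line.startswith("---"):
            continue  # file header, not a content change
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return DiffCounts(added, removed)


def estimate_bounds(counts: DiffCounts) -> Tuple[int, int]:
    """Return (lower, upper) bounds on the number of lines touched.

    Lower bound: every removal is paired with an addition (a modified line).
    Upper bound: no pairing at all; each added or removed line counts once.
    """
    lower = max(counts.added, counts.removed)
    upper = counts.added + counts.removed
    return lower, upper


def estimate_commit_size(counts: DiffCounts) -> float:
    """Hypothetical estimate: the midpoint of the two bounds."""
    lower, upper = estimate_bounds(counts)
    return (lower + upper) / 2
```

A diff with two added lines and one removed line gives bounds of 2 and 3. The true value depends on whether the removed line was modified or deleted outright, which the diff alone does not reveal.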

Similar resources

Detecting the number of signals for an undamped exponential model using cross-validation approach

Detecting the number of signals and estimating the parameters of the signals are important problems in Statistical Signal Processing. Quite a number of papers appeared in the last twenty years in estimating the parameters of an exponential signal quite efficiently but not that much of attention has been paid in estimating the number of signals of an exponential signal model. Recently it is obse...

PoooL: an efficient method for estimating haplotype frequencies from large DNA pools

MOTIVATION Pooling DNA is a cost-effective alternative to individual genotyping method. It is often used for initial screening in genome-wide association analysis. In some studies, large pools with sizes up to several hundreds were applied in order to significantly reduce genotyping cost. However, method for estimating haplotype frequencies from large DNA pools has not been available due to com...

A Model of the Commit Size Distribution of Open Source

A fundamental unit of work in programming is the code contribution (“commit”) that a developer makes to the code base of the project in work. We use statistical methods to derive a model of the probabilistic distribution of commit sizes in open source projects and we show that the model is applicable to different project sizes. We use both graphical as well as statistical methods to validate th...

Performance of Short-Commit in Extreme Database Environment

Muhammad Rizwan, Department of Computer Engineering, University of Engineering and Technology, Taxila, Pakistan. Abstract: Atomic commit protocols are used where data integrity is more important than data availability. Two-Phase commit (2PC) is a standard commit protocol for commercial database management systems. To reduce certain drawbacks in 2PC protocol people h...

SPH simulations of grain growth in protoplanetary disks

Aims. In order to understand the first stages of planet formation, when tiny grains aggregate to form planetesimals, one needs to simultaneously model grain growth, vertical settling and radial migration of dust in protoplanetary disks. In this study, we implement an analytical prescription for grain growth into a 3D two-phase hydrodynamics code to understand its effects on the dust distributio...

Publication date: 2009